fix(vats): fix promise space reset() misbehavior #7710

Merged
merged 5 commits into master from 7709-promise-space-reset
May 12, 2023

Conversation

warner
Member

@warner warner commented May 12, 2023

Previously, the reset() feature misbehaved in a common use pattern, where both the reset and resolve facets were extracted at the same time. The state was nominally held in nameToState.get(name), but was sometimes used by closed-over state and pk variables. Which one you got depended upon when the Proxy get trap was called. If resolve was fetched early and retained across a reset() call, then resolve() would resolve the old promise. Later, when consume was used, the space would create a new Promise to satisfy the request, which would never be resolved.

This changes the implementation to strictly keep/use the state in nameToState, and allow all resolve/reject/reset methods to work the same way no matter when they were retrieved (they close over name but not any state).
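The difference between closing over state and closing over `name` can be sketched with a much-simplified model. This is not the real promise-space.js: `makeKit`, `makeSpace`, the `lookupLate` flag, and `isResolved` are hypothetical names invented for the illustration, and the real code uses Proxy get traps rather than plain functions.

```javascript
// Simplified model for illustration only; not the real promise-space.js.
const makeKit = () => {
  let resolve;
  const promise = new Promise(r => {
    resolve = r;
  });
  return { promise, resolve, resolved: false };
};

// lookupLate=false models the old behavior (facets close over state).
// lookupLate=true models the fix (facets close over the name only).
const makeSpace = lookupLate => {
  const nameToState = new Map();
  const provide = name => {
    if (!nameToState.has(name)) nameToState.set(name, makeKit());
    return nameToState.get(name);
  };
  const produce = name => {
    const early = provide(name); // state as of when the facets were fetched
    const current = () => (lookupLate ? provide(name) : early);
    return {
      resolve: value => {
        const kit = current();
        kit.resolved = true;
        kit.resolve(value);
      },
      reset: () => nameToState.delete(name),
    };
  };
  const consume = name => provide(name).promise;
  const isResolved = name => provide(name).resolved;
  return { produce, consume, isResolved };
};

// The common pattern: extract both facets at once, reset, then resolve.
const resetThenResolve = space => {
  const { resolve, reset } = space.produce('foo');
  reset();
  resolve('new value');
  // Reports whether the promise that consumers would now get was resolved.
  return space.isResolved('foo');
};

console.log(resetThenResolve(makeSpace(false))); // false: the old promise was resolved
console.log(resetThenResolve(makeSpace(true))); // true: the current promise was resolved
```

In the first case the retained `resolve` settles the promise that `reset()` already discarded, so a later consumer waits on a fresh promise that nothing will resolve.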

closes #7709

@warner
Member Author

warner commented May 12, 2023

@dckc please run your init/upgrade-proposal test against this branch and see if it behaves better. I think the symptom of the buggy promise-space should have been that the consume.myInstance promise would never resolve, so the new vat would never be upgraded. But I know some of the test runs had different symptoms: let's make sure those were artifacts of the test changes we were making and not some new/independent problem.

@michaelfig please make sure this matches your original intentions for how reset() should work.

@erights You mentioned that the Proxy could be safer, but it wasn't obvious to me what should be done. Let me know what should be changed.

Concerns:

  • Is remaining cleared at the right time? Previously reset() would clear it, but the .finally did too, and I think a produce.reset(reason); const { foo } = consume; would incorrectly remove foo from remaining shortly after the consume.foo get trap re-added it. The new logic should not have that problem, but I might have managed to introduce a new one.
  • Am I calling the hooks at the right time?
  • I renamed one unit test (makePromiseSpace backed by store) to say "copied into store", because as far as I can tell, the store is never read from. To be "backed by", I'd expect the data to be stored primarily in the store, which would mean that makePromiseSpace(store) would read pre-existing store contents (e.g. from an earlier space), and that would require something like a for (const [k,v] of store.entries()) { produce[k].resolve(v); }. Might be a neat feature, maybe for a future replacement vat-bootstrap, but not what we've got so far. @michaelfig does that seem accurate?
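A rough sketch of what the "backed by" reading in the last concern would involve, assuming a store that exposes `entries()` the way a JavaScript `Map` does. `makeTinySpace`, `rehydrate`, and `settledValues` are hypothetical names for this illustration; nothing like this exists in the PR.

```javascript
// Hypothetical toy space; the real makePromiseSpace has more features.
const makeTinySpace = () => {
  const kits = new Map();
  const settledValues = new Map(); // kept only so the sketch is easy to inspect
  const provide = name => {
    if (!kits.has(name)) {
      let resolve;
      const promise = new Promise(r => {
        resolve = r;
      });
      kits.set(name, { promise, resolve });
    }
    return kits.get(name);
  };
  const produce = new Proxy(Object.freeze({}), {
    get: (_target, name) => ({
      resolve: value => {
        settledValues.set(name, value);
        provide(name).resolve(value);
      },
    }),
  });
  const consume = new Proxy(Object.freeze({}), {
    get: (_target, name) => provide(name).promise,
  });
  return { produce, consume, settledValues };
};

// Read pre-existing store contents into a fresh space, as the comment suggests.
const rehydrate = (store, produce) => {
  for (const [k, v] of store.entries()) {
    produce[k].resolve(v);
  }
};

// A Map stands in for the store left behind by an earlier space (assumption).
const store = new Map([
  ['bank', 'bank-presence'],
  ['zoe', 'zoe-presence'],
]);
const space = makeTinySpace();
rehydrate(store, space.produce);
```

After `rehydrate`, `space.consume.bank` is a promise that fulfills with the stored value, which is the behavior the name "backed by store" would imply.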

@warner warner self-assigned this May 12, 2023
@dckc
Member

dckc commented May 12, 2023

destructure twice work-around: no joy

Before trying out this proposed fix, I thought this idea was worth a try:

he could rewrite init-proposal.js to destructure produce twice, to work around the promise-space bug,

The rewrite is:

  • 2023-05-12 09:15 9810300d1 WIP: work around promiseSpace.reset()

I ran scripts/start-local-chain.sh from packages/inter-protocol as usual before running AGORIC_NET=local ./test/upgrade-contract/contract-upgrade-smoketest.sh from packages/agoric-cli.

Proposals 1 and 2 failed due to unrelated errors

My first smoke test attempt failed because I forgot to sufficiently fund the account that was supposed to upload the bundle. (The smoke test script seems to not handle this failure mode):

  1. "2023-05-12T13:45:00.174512787Z","PROPOSAL_STATUS_PASSED"
2023-05-12T13:45:05.353Z SwingSet: xsnap: v10: Error: bundleID not yet installed: b1-2a3be684d00d708cbaf7503e0ad6a1894736946122f1fd2180d9b9c33ad93094bffe208db0a8a13acfcc4c71810d143329da11eae8cf8d127578420842f758e6

The second one failed due to a typo: I wrote instance.myInstance.reset() where I meant instance.produce.myInstance.reset():

  1. "2023-05-12T13:46:50.476782586Z","PROPOSAL_STATUS_PASSED"]
2023-05-12T13:47:00.704Z SwingSet: xsnap: v10: RemoteError(error:liveSlots:v1#70002)#2: "myInstance" not permitted, only ["produce"]

init, upgrade proposals succeed on 1st smoke test

  1. "2023-05-12T13:50:31.044262437Z","PROPOSAL_STATUS_PASSED"
  2. "2023-05-12T13:51:36.203487803Z","PROPOSAL_STATUS_PASSED"
2023-05-12T13:51:42.302Z SwingSet: kernel: vat v46 upgraded from incarnation 0 to 1 with source b1-f9b241fdac20a141e2946d863584ea78fce6b3b4b198f41bb7919b0379048a6a0a90eb6cd98ca0529ce74b38fa2b425ebd216043fb5e6ea096b0e863c65ffefa

upgraded behavior seems to work. note numWantsSatisfied: 1, capitalized Free Tokens!!! with 3 !s

$ agops inter my make --from gov1
Object [Alleged: BoardRemoteInstanceHandle] { getBoardId: {} }
2023-05-12T13:54:31.790896575Z tx not in block 307 retrying...
2023-05-12T13:54:31.790896575Z tx not in block 307 retrying...
{"height":"308","txhash":"DD47D7EA68E4427BE640CDCF7959630552ED0730C1566444A4E5A07D77262C86"}
{"id":"my-1683899677507","invitationSpec":{"instance":{},"publicInvitationMaker":"makeInvitation","source":"contract"},"numWantsSatisfied":1,"proposal":{"want":{"Tokens":{"brand":{},"value":"32"}}},"result":"Congratulations, Free Tokens!!!"}

ps shows:

xsnap-worker v46:zcf-b1-1b57f-myContract -r @8:v46-14

Unfortunately, in an attempt to use DEBUG=label-instances, I suppressed all logging from vats in this run.
cosmic-swingset/Makefile shows DEBUG ?= SwingSet:ls,SwingSet:vat. I aim to try that next time.

upgrade affects wrong vat on 2nd smoke test

I paused the smoke test script after PROPOSAL_STATUS_VOTING_PERIOD to let me interact with the "broken" contract before upgrading this time.

  1. "2023-05-12T14:03:23.076929024Z","PROPOSAL_STATUS_PASSED"

as expected, broken contract behavior observed: no numWantsSatisfied, lowercase free tokens! with 1 !.

$ agops inter my make --from gov1
Object [Alleged: BoardRemoteInstanceHandle] { getBoardId: {} }
2023-05-12T14:04:43.415637069Z tx not in block 429 retrying...
{"height":"430","txhash":"064C2680487155A0F4211D4421B64964A0C9BDA91A9D9A4226D4AF708362BAFF"}
2023-05-12T14:04:48.429625996Z offer not in wallet at block 430 retrying...
{"id":"my-1683900293221","invitationSpec":{"instance":{},"publicInvitationMaker":"makeInvitation","source":"contract"},"numWantsSatisfied":0,"payouts":{"Tokens":{"brand":{},"value":"0"}},"proposal":{"want":{"Tokens":{"brand":{},"value":"32"}}},"result":"Congratulations, free tokens!"}

Then I let the smoke test resume upgrading:

  1. "2023-05-12T14:06:58.677453841Z","PROPOSAL_STATUS_PASSED"]

but note that the upgrade affected v46, the vat from the 1st smoke test run:

2023-05-12T14:07:05.020Z SwingSet: kernel: vat v46 upgraded from incarnation 1 to 2 with source b1-f9b241fdac20a141e2946d863584ea78fce6b3b4b198f41bb7919b0379048a6a0a90eb6cd98ca0529ce74b38fa2b425ebd216043fb5e6ea096b0e863c65ffefa

ps shows

v46:zcf-b1-1b57f-myContract
v47:zcf-b1-1b57f-myContract -r @8:v47-4

@warner
Member Author

warner commented May 12, 2023

I don't understand the "Chain deployment test" failure. It feels like the client somehow halted around https://github.com/Agoric/agoric-sdk/actions/runs/4956832069/jobs/8867776209?pr=7710#step:10:6894 (maybe while trying to provision the wallet?), since no progress seems to be made after that point, but I don't see any errors other than the "peer connection lost", and I'm not even sure the client would have been treated as a peer. I need to compare this output against a successful run, and see where they diverge.

@dckc
Member

dckc commented May 12, 2023

please run your init/upgrade-proposal test against this branch and see if it behaves better.

it does!

cherry-picking this fix into mfig-contract-upgrade-e2e gets us:

  • 2023-05-12 01:07 2a242bd fix(vats): fix promise space reset() misbehavior

repeating the experiment above, on the 2nd smoke test, we get the correct vat upgraded:

2023-05-12T15:22:01.057Z SwingSet: kernel: vat v47 upgraded from incarnation 0 to 1 with source b1-f9b241fdac20a141e2946d863584ea78fce6b3b4b198f41bb7919b0379048a6a0a90eb6cd98ca0529ce74b38fa2b425ebd216043fb5e6ea096b0e863c65ffefa

and correct upgraded contract behavior:

$ agops inter my make --from gov1
Object [Alleged: BoardRemoteInstanceHandle] { getBoardId: {} }
2023-05-12T15:25:15.527007001Z tx not in block 397 retrying...
2023-05-12T15:25:20.539637052Z tx not in block 398 retrying...
{"height":"398","txhash":"462721C3E27CEE5F0F31AB5F1BF91A68B044895AA660287EF727D300D1967ED2"}
{"id":"my-1683905122857","invitationSpec":{"instance":{},"publicInvitationMaker":"makeInvitation","source":"contract"},"numWantsSatisfied":1,"proposal":{"want":{"Tokens":{"brand":{},"value":"32"}}},"result":"Congratulations, Free Tokens!!!"}

p.s. running the smoke test a 3rd time works too:
gov-q:

["5","2023-05-12T15:36:01.975828598Z","PROPOSAL_STATUS_PASSED"]
["6","2023-05-12T15:41:22.717743466Z","PROPOSAL_STATUS_PASSED"]

chain log:

2023-05-12T15:41:28.840Z SwingSet: kernel: vat v48 upgraded from incarnation 0 to 1 with source b1-f9b241fdac20a141e2946d863584ea78fce6b3b4b198f41bb7919b0379048a6a0a90eb6cd98ca0529ce74b38fa2b425ebd216043fb5e6ea096b0e863c65ffefa

smart wallet interaction with upgraded contract:

$ agops inter my make --from gov1
Object [Alleged: BoardRemoteInstanceHandle] { getBoardId: {} }
2023-05-12T15:42:22.974407921Z tx not in block 602 retrying...
{"height":"603","txhash":"0FAD298B3CA295C9AC2921E4B89449C245705FE755B8F28709EA756DEEE64326"}
{"id":"my-1683906151298","invitationSpec":{"instance":{},"publicInvitationMaker":"makeInvitation","source":"contract"},"numWantsSatisfied":1,"proposal":{"want":{"Tokens":{"brand":{},"value":"32"}}},"result":"Congratulations, Free Tokens!!!"}

Member

@dckc dckc left a comment

the tests look good, including backed by -> copied into

test('makePromiseSpace backed by store', async t => {
test('makePromiseSpace copied into store', async t => {
Member

👍

Member

@dckc dckc left a comment

the new implementation is... slightly less puzzling to me than the old one :)

That is: this should wait for review by @michaelfig . But if he's not available, I think it should land; I'm willing to help maintain it.

packages/vats/src/core/promise-space.js (outdated, resolved)
packages/vats/src/core/promise-space.js (outdated, resolved)
packages/vats/src/core/promise-space.js (outdated, resolved)
packages/vats/src/core/promise-space.js (outdated, resolved)
@warner warner requested a review from erights May 12, 2023 18:36
Member

@erights erights left a comment

On line 62-66

      if (isPromise(valueP)) {
        void valueP.then(value => save(name, value));
      } else {
        save(name, valueP);
      }

is NOT reentrancy safe, because isPromise does not check that valueP is a safe promise. IOW, it may still have an own malicious then method. Could check for a safe promise (`passStyleOf(valueP) === 'promise'`), but better to just do

      if (isPromise(valueP)) {
        void E.when(valueP, value => save(name, value));
      } else {
        save(name, valueP);
      }

Even better in general would be

      void E.when(valueP, value => save(name, value));

unless there's a reason you need the synchronous shortcut. Do you?

Icing on the cake is that E.when knows how to do deep stack turn tracking.
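To make the reentrancy concern concrete, here is a minimal sketch in plain JavaScript. It does not use the real `E.when` from `@endo/eventual-send`; `deferredSave` is a hypothetical stand-in that only reproduces the "call `then` in a later turn" property, and `instanceof Promise` stands in for `isPromise`. `makeSneaky`, `naiveSave`, and `trace` are likewise names invented for the illustration.

```javascript
// A genuine Promise carrying an own `then` that calls back synchronously.
const makeSneaky = events => {
  const sneaky = Promise.resolve('value');
  sneaky.then = onFulfilled => {
    events.push('sneaky then called');
    onFulfilled('value'); // runs the callback inside the caller's turn
  };
  return sneaky;
};

// Mirrors the shape of the reviewed code: trust valueP.then directly.
const naiveSave = (valueP, save) => {
  if (valueP instanceof Promise) {
    void valueP.then(save);
  } else {
    save(valueP);
  }
};

// Stand-in for the suggested approach: resolving a fresh promise with valueP
// makes the engine call valueP.then in a later microtask, not right now.
const deferredSave = (valueP, save) => {
  void new Promise(resolve => resolve(valueP)).then(save);
};

// Records what happens synchronously around the call to the saver.
const trace = saver => {
  const events = ['start'];
  saver(makeSneaky(events), value => events.push(`saved ${value}`));
  events.push('end');
  return events.join(', ');
};

console.log(trace(naiveSave)); // start, sneaky then called, saved value, end
console.log(trace(deferredSave)); // start, end
```

With `naiveSave`, the save callback runs between `start` and `end`, in the middle of the caller's own turn. With `deferredSave`, nothing from the sneaky promise runs until the caller has finished, at the cost of an extra turn before the value is saved.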

@warner
Member Author

warner commented May 12, 2023

On line 62-66

      if (isPromise(valueP)) {
        void valueP.then(value => save(name, value));
      } else {
        save(name, valueP);
      }

is NOT reentrancy safe, because isPromise does not check that valueP is a safe promise. IOW, it may still have an own malicious then method. Could check for a safe promise (`passStyleOf(valueP) === 'promise'`), but better to just do

      if (isPromise(valueP)) {
        void E.when(valueP, value => save(name, value));
      } else {
        save(name, valueP);
      }

Even better in general would be

      void E.when(valueP, value => save(name, value));

unless there's a reason you need the synchronous shortcut. Do you?

I don't know. Given what I can glean from the code, callers (behavior functions) might resolve() with a real value (frequently a Presence or a record with Presences), or a Promise for such. If they use a Promise, then.. huh, it could hang out for an undetermined time before finally settling, which makes the removal of the name from remaining feel a bit premature. The original version only removed from remaining upon .finally (or reset()), and now I think that I shouldn't have changed that part (remaining is only used for debugging, but if someone does produce.foo.resolve(new Promise(() => 0)), it'll hang forever but not report foo as still being pending).

Ok, so I should really move the remaining.delete into the .finally, and I think the corresponding thing for the onResolve hook is to not need the synchronous shortcut, as you suggested. So I'll also change it to E.when.
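The `remaining` bookkeeping discussed here can be sketched as follows. This is an illustration under assumptions, not the code in the PR: `makeTracker` and the `deleteOnSettle` flag are hypothetical, and the real space also has to handle reject and reset.

```javascript
// deleteOnSettle=false: drop the name as soon as resolve() is called.
// deleteOnSettle=true: drop the name only once the promise actually settles.
const makeTracker = deleteOnSettle => {
  const remaining = new Set(); // names still pending, used for debugging
  const kits = new Map();
  const provide = name => {
    if (!kits.has(name)) {
      let resolve;
      const promise = new Promise(r => {
        resolve = r;
      });
      kits.set(name, { promise, resolve });
      remaining.add(name);
      if (deleteOnSettle) {
        // The trailing catch keeps a rejection from being reported twice.
        promise.finally(() => remaining.delete(name)).catch(() => {});
      }
    }
    return kits.get(name);
  };
  const resolve = (name, valueP) => {
    provide(name).resolve(valueP);
    if (!deleteOnSettle) remaining.delete(name);
  };
  return { resolve, remaining };
};

const neverSettles = new Promise(() => 0);

const eager = makeTracker(false);
eager.resolve('foo', neverSettles);
console.log(eager.remaining.has('foo')); // false: foo hangs forever but is not reported

const onSettle = makeTracker(true);
onSettle.resolve('foo', neverSettles);
console.log(onSettle.remaining.has('foo')); // true: foo is still listed as pending
```

Deleting in `.finally` keeps `remaining` honest when a caller resolves a name with a promise that takes a long time, or forever, to settle.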

@michaelfig I could really use your review on this, I want to make sure I didn't violate any of your other intentions like I did with remaining.

Icing on the cake is that E.when knows how to do deep stack turn tracking.

Member

@erights erights left a comment

Minor note. Feel free to ignore this round, but capturing while I see it

The code on 90-94 is correct, but I'd rewrite it as

  const { log = noop, store = undefined, hooks = undefined } = opts;
  const logHooks = store
    ? makeStoreHooks(store, log)
    : hooks || makeLogHooks(log);
  const { onAddKey, onSettled, onResolve, onReset } = logHooks;

except that TS complains about it in a way that makes no sense to me. Certainly not worth untangling the TS problem today, so ignore for now.

Member

@erights erights left a comment

LGTM, thanks!

For the record, I do not thoroughly understand the code yet, so I'm really approving to remove my "request changes" block, enabling you to act on your existing approvals.

Everything I understood looks good, and I see no red flags. But this is certainly a weird abstraction that I'd like to come back to later to thoroughly understand. It looks important!

@warner warner enabled auto-merge May 12, 2023 19:31
@warner warner added this pull request to the merge queue May 12, 2023
@michaelfig michaelfig removed this pull request from the merge queue due to a manual request May 12, 2023
packages/vats/package.json (outdated, resolved)
packages/vats/src/core/promise-space.js (outdated, resolved)
packages/vats/src/core/promise-space.js (outdated, resolved)
packages/vats/src/core/promise-space.js (resolved)
Previously, the reset() feature misbehaved in a common use pattern,
where both the `reset` and `resolve` facets were extracted at the same
time. The state was nominally held in `nameToState.get(name)`, but was
sometimes used by closed-over `state` and `pk` variables. Which one
you got depended upon when the Proxy `get` trap was called. If
`resolve` was fetched early and retained across a `reset()` call, then
`resolve()` would resolve the *old* promise. Later, when `consume` was
used, the space would create a new Promise to satisfy the request,
which would never be resolved.

This changes the implementation to strictly keep/use the state in
`nameToState`, and allow all resolve/reject/reset methods to work the
same way no matter when they were retrieved (they close over `name`
but not any state).

closes #7709
per @erights: this prevents non-get operations from mutating the
target and using it as a communications channel.
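One way to see the concern, sketched under assumptions: a Proxy with only a `get` trap forwards every other operation to its target, so a mutable target can be written to through the proxy. Freezing the target is one possible defense and is an assumption of this sketch; the commit may achieve the same effect differently. `makeFacetProxy` is a hypothetical helper.

```javascript
// Only `get` is trapped; set, defineProperty, etc. fall through to the target.
const makeFacetProxy = target =>
  new Proxy(target, {
    get: (_target, name) => `facet for ${String(name)}`,
  });

const loose = makeFacetProxy({}); // mutable target
const tight = makeFacetProxy(Object.freeze({})); // frozen target

// Reflect.set reports whether the write succeeded instead of throwing.
const looseSet = Reflect.set(loose, 'note', 'hello');
const tightSet = Reflect.set(tight, 'note', 'hello');

console.log(looseSet, Object.keys(loose)); // true [ 'note' ]
console.log(tightSet, Object.keys(tight)); // false []
```

With the mutable target, two parties holding the same proxy could pass data to each other through properties like `note`; with the frozen target the write is refused.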
This is more defensive against sneaky promises, at the cost of
delaying storage by a turn for non-promises.
Co-authored-by: Michael FIG <mfig@agoric.com>
@warner warner enabled auto-merge May 12, 2023 20:17
@warner warner added this pull request to the merge queue May 12, 2023
Merged via the queue into master with commit 2d07ae9 May 12, 2023
@warner warner deleted the 7709-promise-space-reset branch May 12, 2023 21:00
Development

Successfully merging this pull request may close these issues.

contract upgrade proposals fragile due to promiseSpace reset strange behavior
5 participants